Multiple Pattern Matching in LZW Compressed Text

Authors

  • Takuya Kida
  • Masayuki Takeda
  • Ayumi Shinohara
  • Masamichi Miyazaki
  • Setsuo Arikawa
Abstract

In this paper we address the problem of searching directly in LZW compressed text, and present a new algorithm that finds multiple patterns by simulating the moves of the Aho-Corasick pattern matching machine. The new algorithm finds all occurrences of multiple patterns, whereas the algorithm proposed by Amir, Benson, and Farach finds only the first occurrence of a single pattern. The new algorithm runs in O(n + m + r) time using O(n + m) space, where n is the length of the compressed text, m is the total length of the patterns, and r is the number of occurrences of the patterns. We implemented a simple version of the algorithm and showed that it is approximately twice as fast as decompressing the text and then searching it with the Aho-Corasick machine.
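
For reference, the baseline that the abstract compares against, decompression followed by an Aho-Corasick search, can be sketched in Python as below. This is not the authors' algorithm. It still visits every character of the uncompressed text, which is the cost their O(n + m + r) method avoids by simulating the machine per LZW codeword. The helper names (`lzw_encode`, `lzw_decode`, `AhoCorasick`, `search_lzw`) are hypothetical. The sketch also assumes an initial LZW dictionary containing only the distinct characters of the text (real implementations usually start from a fixed byte alphabet) and assumes non-empty patterns.

```python
from collections import deque


def lzw_encode(text: str) -> tuple[list[int], list[str]]:
    """Plain LZW; ASSUMPTION: the initial dictionary is the sorted set of text characters."""
    alphabet = sorted(set(text))
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes: list[int] = []
    w = ""
    for ch in text:
        if w + ch in dictionary:
            w += ch
        else:
            codes.append(dictionary[w])
            dictionary[w + ch] = len(dictionary)  # new phrase = old phrase + one char
            w = ch
    if w:
        codes.append(dictionary[w])
    return codes, alphabet


def lzw_decode(codes: list[int], alphabet: list[str]):
    """Yield the decoded phrase for each LZW codeword, rebuilding the dictionary."""
    phrases = list(alphabet)
    prev = None
    for code in codes:
        if code < len(phrases):
            cur = phrases[code]
        else:  # code refers to the phrase currently being defined (cScSc case)
            cur = prev + prev[0]
        if prev is not None:
            phrases.append(prev + cur[0])
        yield cur
        prev = cur


class AhoCorasick:
    def __init__(self, patterns: list[str]):
        self.goto: list[dict[str, int]] = [{}]
        self.fail: list[int] = [0]
        self.out: list[list[int]] = [[]]  # indices of patterns recognized at each state
        for idx, p in enumerate(patterns):
            s = 0
            for ch in p:
                if ch not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.out[s].append(idx)
        # Breadth-first pass computes failure links and merges output sets.
        queue = deque(self.goto[0].values())
        while queue:
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def step(self, s: int, ch: str) -> int:
        """One move of the machine: follow failure links until a goto edge exists."""
        while s and ch not in self.goto[s]:
            s = self.fail[s]
        return self.goto[s].get(ch, 0)


def search_lzw(codes: list[int], alphabet: list[str], patterns: list[str]) -> list[tuple[int, int]]:
    """Return (start position, pattern index) for every occurrence, decoding phrase by phrase."""
    ac = AhoCorasick(patterns)
    state, pos, hits = 0, 0, []
    for phrase in lzw_decode(codes, alphabet):
        for ch in phrase:  # still one machine step per uncompressed character
            state = ac.step(state, ch)
            pos += 1
            for idx in ac.out[state]:
                hits.append((pos - len(patterns[idx]), idx))
    return hits
```

`search_lzw(*lzw_encode(text), patterns)` returns `(start, pattern_index)` pairs in scan order. Sort them if a canonical order is needed.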

Similar resources

Shift-And Approach to Pattern Matching in LZW Compressed Text

This paper considers the Shift-And approach to the problem of pattern matching in LZW compressed text, and gives a new algorithm that solves it. The algorithm is fast when the pattern length is at most 32, the word length. After an O(m + |Σ|) time and O(|Σ|) space preprocessing of a pattern, it scans an LZW compressed text in O(n + r) time and reports all occurrences of the pattern, whe...
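
The bit-parallel Shift-And technique itself can be sketched on plain, uncompressed text to show why the word length matters. The state is a bit vector with one bit per pattern position, updated with one shift, one OR, and one AND per text character. This is only the classical uncompressed algorithm, not the LZW variant summarized above. `shift_and` is a hypothetical name. Python integers are unbounded, so the 32-bit limit does not apply in this sketch, whereas in C or Java the state would have to fit in a machine word.

```python
def shift_and(text: str, pattern: str) -> list[int]:
    """Return start positions of all occurrences of pattern in (uncompressed) text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    # masks[ch] has bit i set iff pattern[i] == ch.
    masks: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)
    state = 0  # bit i set iff pattern[:i + 1] is a suffix of the text read so far
    hits = []
    for j, ch in enumerate(text):
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits
```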

Tying up the loose ends in fully LZW-compressed pattern matching

We consider a natural generalization of the classical pattern matching problem: given compressed representations of a pattern p[1..M] and a text t[1..N] of sizes m and n, respectively, does p occur in t? We develop an optimal linear-time solution for the case when p and t are compressed using the LZW method. This improves the previously known O((n + m) log(n + m)) time solution of Gąsien...

A Unifying Framework for Compressed Pattern Matching

We introduce a general framework suitable for capturing the essence of compressed pattern matching under various dictionary-based compressions. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), byte-...

Almost Optimal Fully LZW-Compressed Pattern Matching

Given two strings, a pattern P and a text T, of lengths |P| = M and |T| = N, the string matching problem is to find all occurrences of pattern P in text T. The fully compressed string matching problem is the string matching problem with the input strings P and T given in compressed forms p and t, respectively, where |p| = m and |t| = n. We present the first almost optimal string matching algorithms for LZW-compr...

Beating O(nm) in approximate LZW-compressed pattern matching

Given an LZW/LZ78 compressed text, we want to find an approximate occurrence of a given pattern of length m. The goal is to achieve time complexity depending on the size n of the compressed representation of the text instead of its length. We consider two specific definitions of approximate matching, namely the Hamming distance and the edit distance, and show how to achieve O(n √ mk) and O(n √ ...
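
To make the Hamming-distance variant concrete, the following naive Python sketch reports every position where the pattern occurs with at most `k` mismatches. It works on the decompressed text and takes time proportional to the text length times m, which is the kind of cost the compressed algorithms aim to beat. The function name `hamming_occurrences` and the use of `k` as the mismatch bound are our own assumptions for illustration.

```python
def hamming_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Start positions where pattern matches text with at most k mismatching characters."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        # Hamming distance between the length-m window at i and the pattern.
        if sum(a != b for a, b in zip(text[i:i + m], pattern)) <= k
    ]
```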

Publication date: 1998